Abstract

In  this  paper  three  problems  related  to  the  analysis  of  facial  images  are  addressed:  the  estimation  of 
the  illuminant  direction,  the  compensation  of  illumination  effects  and,  finally,  the  recovery  of  the  pose  of 
the  face,  restricted  to  in-depth  rotations.  The  solutions  proposed  for  these  problems  rely  on  the  use  of 
computer  graphics  techniques  to  provide  images  of  faces  under  different  illumination  and  pose,  starting 
from  a  database  of  frontal  views  under  frontal  illumination. 
1  Introduction

Automated face perception (localization, recognition and coding) is now a
very active research topic in the computer vision community. Among the
reasons, the possibility of building applications on top of existing
research is probably one of the most important. While recent results on
localization and recognition open the way to automated security systems
based on face identification, breakthroughs in the field of facial image
coding are of practical interest for teleconferencing and database
applications.

In this paper three tasks will be addressed: learning the direction of the
illuminant for frontal views of faces, compensating for non-frontal
illumination and, finally, estimating the pose of a face, limited to
in-depth rotations. The solutions we propose to these tasks share two
important aspects: the use of learning techniques and the synthesis of the
examples used in the learning stage. Learning an input/output mapping from
examples is a powerful general mechanism of problem solving once a suitably
large number of meaningful examples is available. Unfortunately, gathering
the needed examples is often a time-consuming, expensive process. Yet, the
use of a priori knowledge can help in creating new, valid examples from a
(possibly limited) available set. In this paper we use a rough model of the
3D head structure to generate, from a single frontal view of a face under
uniform illumination, a set of views under different poses and illumination
using ray-tracing and texture mapping techniques. The resulting extended
sets of examples will be used for solving the addressed problems using
learning techniques.

2  Learning  the  illuminant  direction 

In this section the computation of the direction of the illuminant is
considered as a learning task (see [1, 2, 3, 4] for other approaches). The
images for which the direction must be computed are very constrained: they
are frontal views of faces with a fixed interocular distance [5]. Once the
illuminant direction is known it can be compensated for, obtaining an image
under standard illumination which can be more easily compared to a database
of faces using standard techniques such as cross-correlation. Let us
introduce a very simple lighting model [6]:

$$ I = (\mathcal{A} + L\,\delta\cos\omega)\,A \tag{1} $$

where $I$ represents the emitted intensity, $\mathcal{A}$ is the ambient
energy, $\delta$ is 1 if the point is visible from the light source and 0
otherwise, $\omega$ is the angle between the incident light and the surface
normal, $A$ is the surface albedo and $L$ is the intensity of the
directional light. Let us assume that a frontal image $I_A$ of a face under
diffuse ambient lighting ($L = 0$) is available:

$$ I_A = \mathcal{A}\,A \tag{2} $$

The detected intensity is then proportional to the surface albedo. Let us
now assume that a 3D model of the same face is available. The corresponding
surface can be easily rendered using ray-tracing techniques if the light
sources and the surface albedo are given. In particular, we can consider a
constant surface albedo $A_0$ and use a single, directional, light source of
intensity $L$ in addition to an appropriate level of ambient light
$\mathcal{A}'$. By changing the direction $(\theta, \phi)$^1 of the emitted
light, the corresponding synthetic image $S(\theta, \phi, \mathcal{A}')$ can
be computed:

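$$ S(\theta, \phi, \mathcal{A}') = (\mathcal{A}' + L\,\delta\cos\omega)\,A_0 \tag{3} $$

Using the albedo information $I_A$ from the real image, a set of images
$I(\theta, \phi, \mathcal{A}')$ can be computed:

$$ I(\theta, \phi, \mathcal{A}') = \frac{1}{A_0\,\mathcal{A}}\, S(\theta, \phi, \mathcal{A}')\, I_A \propto S(\theta, \phi, \mathcal{A}')\, I_A \tag{4} $$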

In  the  following  paragraphs  it  will  be  shown  that  even  a 
very  crude  3D  model  of  a  head  can  be  used  to  generate 
images  for  training  a  network  that  learns  the  direction  of 
the  illuminant.  The  effectiveness  of  the  training  will  be 
demonstrated  by  testing  the  trained  network  on  a  set  of 
real  images  of  a  different  face.  The  resulting  estimates 
are  in  good  quantitative  agreement  with  the  data. 
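The synthesis of eqs. (3) and (4) can be illustrated with a minimal Python
sketch. It is not the ray-tracing procedure used in the paper: a per-pixel
array of unit surface normals stands in for the 3D head model, cast shadows
are ignored (only surfaces facing away from the light are treated as not
visible), and the angle convention of `light_direction` as well as all
function and parameter names are assumptions made for the example.

```python
import math

def light_direction(theta_deg, phi_deg):
    """Unit vector pointing towards the light (assumed convention:
    theta = left-right angle, phi = top-down angle, z towards the camera)."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(t) * math.cos(p), math.sin(p), math.cos(t) * math.cos(p))

def shading(normal, light_dir, ambient, light_intensity):
    """Eq. (3) with unit albedo: ambient + L * delta * cos(omega)."""
    cos_omega = sum(n * l for n, l in zip(normal, light_dir))
    delta = 1.0 if cos_omega > 0.0 else 0.0  # self-shadowing only
    return ambient + light_intensity * delta * cos_omega

def relight(ambient_image, normals, theta_deg, phi_deg,
            ambient=0.2, light_intensity=1.0):
    """Eq. (4) up to a constant factor: modulate the ambient-light image
    (proportional to the albedo) by the synthetic shading."""
    d = light_direction(theta_deg, phi_deg)
    return [[shading(n, d, ambient, light_intensity) * a
             for n, a in zip(normal_row, image_row)]
            for normal_row, image_row in zip(normals, ambient_image)]
```

A pixel whose normal faces the light receives the full directional term,
while a pixel facing away only keeps the ambient contribution.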

From a rather general point of view the problem of learning can be
considered as a problem of function reconstruction from sparse data [7]. The
points at which the function value is known represent the examples while the
function to be reconstructed is the input/output dependence to be learned.
If no additional constraints are imposed, the problem is ill posed. The
single, most important constraint is that of smoothness: similar inputs
should be mapped into similar outputs. Regularization theory formalizes the
concept and provides techniques to select an appropriate family of mappings
among which an approximation to the unknown function can be chosen. Let us
consider the reconstruction of a scalar function $y = f(x)$: the vector case
can be solved by considering each component in turn. Given a parametric
family of mappings $G(x; \alpha)$ and a set of examples $\{(x_i, y_i)\}$,
the function which minimizes the following functional is chosen:

$$ E(\alpha) = \sum_i \bigl(y_i - G(x_i; \alpha) - p(x_i)\bigr)^2 \tag{5} $$

where $y_i = f(x_i)$ and $p(x_i)$ represents a polynomial term related to
the regularization constraints. A common choice for the family $G$ is that
of linear superposition of translates of a single function such as the
Gaussian:

$$ G(x; \{c_j\}, \{t_j\}, W) = \sum_j c_j\, e^{-(x - t_j)^T W^T W (x - t_j)} \tag{6} $$

where $W^T W$ is a positive definite matrix representing a metric (the
polynomial term is not required in this case). The resulting approximation
structure can also be considered as a HyperBF network (see [7] for further
details). In the task of learning the illuminant direction we would like to
associate the direction of the light source to a vector of measurements
derived from a frontal image of a face.
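To make the Gaussian superposition concrete, the following Python sketch
evaluates a network of this form for given coefficients, centers and metric.
It only covers the forward evaluation, not the training; the function name
`hyperbf` and the list-based representation of vectors and matrices are
assumptions made for the example.

```python
import math

def hyperbf(x, coefficients, centers, W):
    """Evaluate G(x) = sum_j c_j * exp(-(x - t_j)^T W^T W (x - t_j)).

    x            -- input vector (list of floats)
    coefficients -- list of c_j
    centers      -- list of center vectors t_j
    W            -- matrix (list of rows); W^T W is the metric
    """
    total = 0.0
    for c, t in zip(coefficients, centers):
        diff = [xi - ti for xi, ti in zip(x, t)]
        # (x - t)^T W^T W (x - t) is the squared norm of W (x - t)
        z = [sum(w * d for w, d in zip(row, diff)) for row in W]
        total += c * math.exp(-sum(v * v for v in z))
    return total
```

With a diagonal `W`, each input component is simply rescaled before the
distance from the center is computed.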

In order to describe the intensity distribution over the face, the central
region (see Figure 1) was divided into four patches, each one represented by
an average intensity value computed using Gaussian weights (see Figure 2).
The domain of the examples is then $\mathbb{R}^4$. Each input vector $x$ is
normalized to length 1, making the input vectors independent of the scaling
of the image intensities, so that the set of images
$\{I(\theta, \phi, \mathcal{A}')\}$ can be replaced by

$$ \{S(\theta, \phi, \mathcal{A}')\, I_A\} $$

^1 The angles $\theta$ and $\phi$ correspond to left-right and top-down
displacement respectively.

The normalization is necessary as it is not the global light intensity which
carries information on the direction of the light source, but rather its
spatial distribution. Using a single image under (approximately) diffuse
lighting, a set of synthetic images was computed using eqn. (3). A rough 3D
model of a polystyrene mannequin head was used to generate the constant
albedo images^2. The direction of the illuminant spanned the range
$\theta, \phi \in [-60, 60]$ with the examples uniformly spaced every 5
degrees. The illumination source used for ray tracing was modeled to match
the environment in which the test images were acquired (low ambient light
and a powerful studio light with diffuser).
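The four-dimensional input vector described above can be sketched in Python
as follows. The patch centers and the width of the Gaussian weights are not
given in the text, so `centers` and `sigma` are placeholders to be chosen by
the user, and the function names are assumptions made for the example.

```python
import math

def gaussian_weighted_mean(image, cx, cy, sigma):
    """Average intensity of `image` (list of rows) under Gaussian weights
    centered at column cx, row cy."""
    num = 0.0
    den = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            num += w * value
            den += w
    return num / den

def illumination_features(image, centers, sigma):
    """One Gaussian-weighted average per patch center, normalized to unit
    length so that a global scaling of the intensities has no effect."""
    raw = [gaussian_weighted_mean(image, cx, cy, sigma) for cx, cy in centers]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]
```

Multiplying every pixel by the same positive constant leaves the resulting
vector unchanged, which is the invariance motivated in the text.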

From each image $S(\theta, \phi, \mathcal{A}')\, I_A$ an example
$(x_{\theta\phi}, \theta)$ is computed. The resulting set of examples is
divided into two subsets to be used for training and testing respectively.
The use of two independent subsets is important as it allows one to check
for the phenomenon of overfitting, usually related to the use of a network
which has too many free parameters for the available set of examples.
Experimentation with several network structures showed that a HyperBF
network with 4 centers and a diagonal metric is appropriate for this task. A
second network is built using the examples $\{(x_{\theta\phi}, \phi)\}$. The
networks are trained separately using a stochastic algorithm with adaptive
memory [8] for the minimization of the global square error of the
corresponding outputs:

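$$ E_\theta(\alpha) = \sum_{\theta,\phi} \bigl(\theta - G_\theta(x_{\theta\phi}; \alpha)\bigr)^2 $$

$$ E_\phi(\alpha) = \sum_{\theta,\phi} \bigl(\phi - G_\phi(x_{\theta\phi}; \alpha)\bigr)^2 $$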

The error $E_\theta$ for the different values of the illuminant direction is
reported in Figure 4. The network trained on the $\theta$ angle is then
tested on a set of four real images for which the direction of the
illuminant is known (see Figure 5). The response of the network is reported
in Figure 6 and is in good agreement with the true values.

Once  the  direction  of  the  illuminant  is  known,  the 
synthetic  images  can  be  used  to  correct  for  it,  providing 
an  image  under  standard  (e.g.  frontal)  illumination.  The 
next  section  details  a  possible  strategy. 

3  Illumination  compensation 

Once the direction of the illuminant is computed, the image can be corrected
for it and transformed into an image under standard illumination, e.g.
frontal. The compensation can proceed along the following steps:

1. compute the direction $(\theta, \phi)$ of the illuminant;

2. establish a pixel to pixel correspondence $M$ between the image to be
corrected $X$ and the reference image $I_A$ used to create the examples:

$$ (x_X, y_X) \xrightarrow{M} (x_{I_A}, y_{I_A}) \tag{7} $$

3. generate a view $I(\theta, \phi, \mathcal{A})$ of the reference image
under the computed illumination;

4. compute the transformation due to the change in illumination between
$I_A$ and $I(\theta, \phi, \mathcal{A})$:

$$ \Delta(x, y) = I_A(x, y) - I(\theta, \phi, \mathcal{A}; x, y) \tag{8} $$

5. apply the transformation $\Delta$ to image $X$ by using the
correspondence map $M$ in the following way [9]:

$$ X(x, y) \to X(x, y) + \Delta(M_x(x, y), M_y(x, y)) \tag{9} $$

^2 A public domain rendering package, Rayshade (version 4) by Craig Kolb,
was used.
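Steps 4 and 5 of the procedure above can be sketched in Python as follows,
assuming the correspondence map and the relit reference view are already
available. Images are plain lists of rows, `mapping[y][x]` holds the
integer reference coordinates `(mx, my)` associated with pixel `(x, y)` of
the image to be corrected, and all names are assumptions made for the
example; sub-pixel interpolation is left out.

```python
def compensate(image, reference, relit_reference, mapping):
    """Apply eqs. (8) and (9) pixel by pixel.

    image           -- image X to be corrected
    reference       -- reference image under standard illumination
    relit_reference -- reference image rendered under the estimated light
    mapping         -- mapping[y][x] = (mx, my), coordinates in the
                       reference frame corresponding to pixel (x, y) of X
    """
    corrected = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            mx, my = mapping[y][x]
            # eq. (8): illumination change measured on the reference face
            delta = reference[my][mx] - relit_reference[my][mx]
            # eq. (9): the same change is added to the image to be corrected
            new_row.append(value + delta)
        corrected.append(new_row)
    return corrected
```

As a sanity check, correcting the relit reference itself with the identity
map returns the reference image.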

The pixel to pixel correspondence $M$ can be computed using optical flow
algorithms [10, 11, 12, 13]. However, in order to use such algorithms
effectively, it is often necessary to pre-adjust the geometry of image $X$
to that of $I_A$ [14]. This can be done by locating relevant features of the
face, such as the nose and mouth, and warping image $X$ so that the location
of these features is the same as in the reference image. The algorithm for
locating the warping features should not be sensitive to the illumination
under which the images are taken and should be able to locate the features
without knowing the identity of the represented person. The usual way to
locate a pattern within an image is to search for the maximum of the
normalized cross-correlation coefficient $\rho_{xy}$ [15]. The sensitivity
of this coefficient to changes in the illumination can be reduced by a
suitable processing of the images prior to the comparison (see Appendix A).
Furthermore, the identity of the person in the image is usually unknown so
that the features should be located using generic templates (a possible
strategy is reported in Appendix B). After locating the nose and mouth the
whole face is divided into four rectangles with sides parallel to the image
boundary: from the eyes upwards, from the eyes to the nose base, from the
nose base to the mouth and from the mouth downwards. The two inner
rectangles are stretched (or shrunk) vertically so that the nose and mouth
are aligned to the corresponding features of the reference image $I_A$. The
lowest rectangle is then modified accordingly. The image contents are then
mapped using the rectangles' affine transformations and a hierarchical
optical flow algorithm is used to build a correspondence map at the pixel
level. The transformations are finally composed to compute the map $M$ and
image $X$ can be corrected according to eqns. (8-9). One of the examples
previously used is reported in Figure 8 under the original illumination and
under the standard one obtained with the described procedure.

4  Pose  Estimation 

In this section we present an algorithm for estimating the pose of a face,
limited to in-depth rotations. The knowledge of the pose can be of interest
both for recognition systems, where an appropriate template can then be
chosen to speed up the recognition process [14], and for model based coding
systems, such as those which could be used for teleconferencing applications
[16]. The idea underlying the proposed algorithm for pose estimation is that
of quantifying the asymmetry between the aspect of the two eyes due to
in-depth rotation and mapping the resulting value to the amount of rotation.
It is possible to visually estimate the in-depth rotation even when the eyes
are represented schematically, such as in some cartoon characters where eyes
are represented by small bars. This suggests that the relative amount of
gradient intensity, along the mouth-forehead direction, in the regions
corresponding to the left and right eye respectively provides enough
information for estimating the in-depth rotation parameter.

The algorithm requires that the location of one of the eyes is approximately
known as well as the direction of the interocular axis. Template matching
techniques such as those outlined in Appendix B can be used to locate one of
the eyes even under large left-right rotations and the direction of the
interocular axis can be computed using the method reported in [17]. Let us
assume for simplicity of notation that the interocular axis is horizontal.
Using the projection techniques reported in [18, 5] we can approximately
localize the region where both eyes are confined. For each pixel in the
region the following map is computed:

$$ V(x, y) = \begin{cases} |\partial_y C(x, y)| & \text{if } |\partial_y C(x, y)| > |\partial_x C(x, y)| \\ 0 & \text{otherwise} \end{cases} \tag{10} $$

where $C(x, y)$ represents the local contrast map of the image computed
according to eqn. (13) (see Appendix A). The resulting map assigns a
positive value to pixels where the projection of the gradient along the
mouth-forehead direction dominates over the projection along the interocular
axis. In order to estimate the asymmetry of the two eyes it is necessary to
determine the regions corresponding to the left and right eye respectively.
This can be done by computing the projection $P(x)$ of $V(x, y)$ on the
horizontal axis, given by the sum of the values in each of the columns. The
analysis of the projections is simplified if they are smoothed: in our
experiments a Gaussian smoother was used. The resulting projections, at
different rotations, are reported in Figure 9. These data are obtained by
rotating the same 3D model used for the generation of the illumination
examples: texture mapping techniques are then used to project a frontal view
of a face onto the rotated head (see [13, 19] for alternative approaches to
the estimation of pose and synthesis of non-frontal views).
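The map of eqn. (10), its column projection and the smoothing step can be
sketched in Python as follows. Central differences are used as one possible
discretization of the derivatives, border pixels are left at zero, and the
function names are assumptions made for the example.

```python
import math

def vertical_gradient_map(C):
    """Eq. (10): keep |dC/dy| where it exceeds |dC/dx|, zero elsewhere.
    C is a list of rows; derivatives are central differences."""
    h, w = len(C), len(C[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dy = abs(C[y + 1][x] - C[y - 1][x]) / 2.0
            dx = abs(C[y][x + 1] - C[y][x - 1]) / 2.0
            out[y][x] = dy if dy > dx else 0.0
    return out

def column_projection(values):
    """P(x): sum of the map over each column."""
    return [sum(column) for column in zip(*values)]

def smooth(P, sigma):
    """Gaussian smoothing of a projection, with replicated borders."""
    r = max(1, int(3 * sigma))
    kernel = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(kernel)
    n = len(P)
    return [sum(kernel[j + r] * P[min(max(i + j, 0), n - 1)]
                for j in range(-r, r + 1)) / total
            for i in range(n)]
```

A pattern that varies only along the vertical direction contributes to
every interior column, while one that varies only horizontally is
suppressed.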

Figure 9 clearly shows that the asymmetry of the two peaks increases with
the amount of rotation. The asymmetry $U$ can be quantified by the following
quantity:

$$ U = \frac{\sum_{x < x_m} P(x) - \sum_{x > x_m} P(x)}{\sum_{x < x_m} P(x) + \sum_{x > x_m} P(x)} \tag{11} $$

where $x_m$ is the coordinate of the minimum between the two peaks. The
value of $U$ as a function of the angle of rotation is reported in Figure
10. Using the approximately linear relation it is possible to quantify the
rotation of a new image. The pose recovered by the described algorithm from
several images is reported in Figure 11 where for each of the testing images
a synthetic image with the corresponding pose is shown.
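The asymmetry measure and its use for estimating the rotation can be
sketched in Python as follows. The least-squares line fit is one simple way
of exploiting the approximately linear relation; the text does not specify
how the relation was fitted, so `fit_line` and `estimate_rotation` are
assumptions made for the example, and any calibration values passed to them
are hypothetical.

```python
def asymmetry(P, x_m):
    """Eq. (11): normalized difference between the mass of the projection
    to the left and to the right of the minimum located at index x_m."""
    left = sum(P[:x_m])
    right = sum(P[x_m + 1:])
    return (left - right) / (left + right)

def fit_line(us, angles):
    """Least-squares fit angle = slope * u + intercept."""
    n = len(us)
    mean_u = sum(us) / n
    mean_a = sum(angles) / n
    cov = sum((u - mean_u) * (a - mean_a) for u, a in zip(us, angles))
    var = sum((u - mean_u) ** 2 for u in us)
    slope = cov / var
    return slope, mean_a - slope * mean_u

def estimate_rotation(u, slope, intercept):
    """Map an asymmetry value to a rotation angle using the fitted line."""
    return slope * u + intercept
```

The line would be fitted once on synthetic views with known rotation and
then applied to the asymmetry measured on a new image.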


5  Conclusions 

In this paper three problems related to the analysis of facial images have
been addressed: the estimation of the illuminant direction, the compensation
of illumination effects and, finally, the recovery of the pose of the face,
restricted to left-right rotations. The solutions proposed for these
problems rely on the use of computer graphics techniques to provide images
of faces under different illumination and pose starting from a database of
frontal views under frontal illumination. The algorithms trained using
synthetic images have been successfully applied to real images.

A  Illumination  sensitivity

A common measure of the similarity of visual patterns, represented as
vectors or arrays of numbers, is the normalized cross-correlation
coefficient:

$$ \rho_{xy} = \frac{\mu_{xy}}{\sqrt{\mu_{xx}\,\mu_{yy}}} \tag{12} $$

where the $\mu_{xy}$ represent the second order, centered moments.

The value of $|\rho_{xy}|$ is equal to 1 if the components of the two
vectors are the same modulo a linear transformation. While the invariance to
linear transformation of the patterns is clearly a desirable property
(automatic gain and black level adjustment of many cameras involve such a
linear transformation) it is not enough to cope with the more general
transformations implied by changes of the illumination sources. A common
approach to the solution of this problem is to process the visual patterns
before the estimation of similarity is done, in order to preserve the
necessary information and eliminate the unwanted details. A common
preprocessing operation is that of computing the intensity of the brightness
gradient and using the resulting map for the comparison of the patterns.
Another preprocessing operation is that of computing the local contrast of
the image. A possible definition is the following:


$$ C = \begin{cases} C' & \text{if } C' \le 1 \\ 2 - \dfrac{1}{C'} & \text{if } C' > 1 \end{cases} \tag{13} $$

where

$$ C' = \frac{I}{I * K_{G(\sigma)}} \tag{14} $$

and $K_{G(\sigma)}$ is a Gaussian kernel whose $\sigma$ is related to the
expected interocular distance. It is important to note that $C$ saturates in
regions of high and low local contrast and is consequently less sensitive to
noise.
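The local contrast operator of eqns. (13) and (14) can be sketched in
Python as follows. The truncation of the Gaussian kernel at three standard
deviations, the replicated image borders and the small constant `eps`
guarding against division by zero are assumptions made for the example, as
are the function names.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    r = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur(image, sigma):
    """Separable Gaussian smoothing with replicated borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(image), len(image[0])
    rows = [[sum(k[j + r] * image[y][min(max(x + j, 0), w - 1)]
                 for j in range(-r, r + 1)) for x in range(w)]
            for y in range(h)]
    return [[sum(k[j + r] * rows[min(max(y + j, 0), h - 1)][x]
                 for j in range(-r, r + 1)) for x in range(w)]
            for y in range(h)]

def local_contrast(image, sigma, eps=1e-12):
    """Eqs. (13)-(14): ratio of the image to its smoothed version,
    compressed so that the result stays below 2."""
    smoothed = blur(image, sigma)
    result = []
    for row, smooth_row in zip(image, smoothed):
        out_row = []
        for value, mean in zip(row, smooth_row):
            c = value / max(mean, eps)                          # eq. (14)
            out_row.append(c if c <= 1.0 else 2.0 - 1.0 / c)   # eq. (13)
        result.append(out_row)
    return result
```

Because the operator is a ratio, multiplying the whole image by a positive
constant does not change its output.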

Recently some claims have been made that the gradient direction field has
good properties of invariance to changes in the illumination [20]. In the
case of the direction field, where a vector is associated to each single
pixel of the image, the similarity can be computed by measuring the
alignment of the gradient vectors at each pixel. Let $g_1(x, y)$ and
$g_2(x, y)$ be the gradient fields of the two images and $\|\cdot\|$
represent the usual vector norm. The global alignment can be defined by


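$$ A = \frac{\sum_{x,y} w(x, y)\, \dfrac{g_1(x, y) \cdot g_2(x, y)}{\|g_1(x, y)\|\,\|g_2(x, y)\|}}{\sum_{x,y} w(x, y)} \tag{15} $$

where

$$ w(x, y) = \tfrac{1}{2}\bigl(\|g_1(x, y)\| + \|g_2(x, y)\|\bigr) \tag{16} $$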

The formula is very similar to the one used in [20] (a normalization factor
has been added). The following preprocessing operators were compared using
either the normalized cross-correlation coefficient $\rho$ or the alignment
$A$:

plain: the original brightness image convolved with a Gaussian kernel of
width $\sigma$;

contrast: each pixel is represented by the local image contrast as given by
eqn. (13);

gradient: each pixel is represented by the brightness gradient intensity
computed after convolving the image with a Gaussian kernel of standard
deviation $\sigma$:

$$ \|\nabla(N_\sigma * I(x, y))\| \tag{17} $$

gradient direction: each pixel is represented by the brightness gradient of
$I(x, y)$. The similarity is estimated through the coefficient $A$ of
eqn. (15);

laplacian: each pixel is represented by the value of the laplacian operator
applied to the intensity image convolved with a Gaussian kernel.

For each of the preprocessing operators, the similarity of the original
image under (nearly) diffuse illumination to the synthetic images obtained
through eqn. (4) was computed. The corresponding average values are reported
in Figure 12 for different values of the parameter $\sigma$ of the
preprocessing operators. The local contrast operator turns out to be the
least sensitive to variations in the illuminant direction. It is also worth
mentioning that the minimal sensitivity is achieved for an intermediate
value of $\sigma$: this should be compared to the monotonic behavior of the
other operators. Further experiments with the template-based face
recognition system described in [5] have practically demonstrated the
advantage of using the local contrast images for the face recognition task.


B  Alternative  Template  Matching 

The correlation coefficient is quite sensitive to noise and alternative
estimators of pattern similarity may be preferred. Such measures can be
derived from distances other than the Euclidean, such as the $L_1$ norm
defined by:

$$ d_1(x, y) = \sum_{i=1}^{n} |x_i - y_i| \tag{18} $$

where $n$ is the dimension of the considered vectors. A similarity measure
based on the $L_1$ norm can be introduced:

$$ l(x', y') = 1 - \frac{\sum_i |x'_i - y'_i|}{\sum_i \bigl(|x'_i| + |y'_i|\bigr)} \tag{19} $$

that satisfies the following relations:

$$ l(x', y') \in [0, 1] $$

$$ l(x', y') = 1 \Leftrightarrow x' = y' $$

$$ l(x', y') = 0 \Leftrightarrow x' = -y' $$

where $x'$ and $y'$ are normalized to have zero average and unit variance.
The characteristics of this similarity measure are extensively discussed in
[21] where it is shown that it is less sensitive to noise than $\rho_{xy}$
and technically robust [22]. Hierarchical approaches to the computation of
correlation, such as those proposed in [23], are readily extended to the use
of this alternative coefficient.
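The $L_1$-based similarity can be compared with the correlation coefficient
using the following Python sketch. Patterns are plain lists of numbers and
are assumed not to be constant (otherwise the variance normalization is
undefined); the function names are assumptions made for the example.

```python
import math

def standardize(v):
    """Normalize a pattern to zero average and unit variance."""
    n = len(v)
    mean = sum(v) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in v) / n)
    return [(a - mean) / sd for a in v]

def l1_similarity(x, y):
    """Eq. (19) applied to the standardized patterns."""
    xs, ys = standardize(x), standardize(y)
    num = sum(abs(a - b) for a, b in zip(xs, ys))
    den = sum(abs(a) + abs(b) for a, b in zip(xs, ys))
    return 1.0 - num / den

def correlation(x, y):
    """Normalized cross-correlation coefficient of eq. (12)."""
    xs, ys = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(xs, ys)) / len(x)
```

Both measures return 1 when the two patterns differ only by a gain with
positive sign and an offset, while the $L_1$-based measure returns 0 for a
pattern and its negative.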

The influence of template shape can be further reduced by slightly modifying
$l(x, y)$. Let us assume that the template $T$ and the corresponding image
patch are normalized to zero average and unit variance. We denote by
$\Omega_I(x)$ the 4-connected neighborhood of point $x$ in image $I$ and by
$F_{\Omega_I(x)}(w)$ the intensity value in $\Omega_I(x)$ whose absolute
difference from $w$ is minimum: if two values qualify, their average ($w$)
is returned. A modified $l(x, y)$ can then be introduced:

$$ l'(y) = 1 - \frac{\sum_x \bigl|F_{\Omega_I(x+y)}(T(x)) - T(x)\bigr|}{\sum_x \bigl(|F_{\Omega_I(x+y)}(T(x))| + |T(x)|\bigr)} $$

The new coefficient introduces the possibility of local deformation in the
computation of similarity (see also [24] for an alternative approach).


References 

[1] A. P. Pentland. Local Shading Analysis. In From Pixels to Predicates,
chapter 3. Ablex Publishing Corporation, 1986.

[2] A. Shashua. Illumination and View Position in 3D Visual Recognition. In
Advances in Neural Information Processing Systems, pages 572-577. Morgan
Kaufmann, 1992.

[3] P. W. Hallinan. A Low-Dimensional Representation of Human Faces For
Arbitrary Lighting Conditions. Technical Report 93-6, Harvard Robotics Lab,
December 1993.

[4] A. Shashua. On Photometric Issues in 3D Visual Recognition From A Single
2D Image. International Journal of Computer Vision, 1994. To appear.

[5] R. Brunelli and T. Poggio. Face Recognition: Features versus Templates.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
15(10):1042-1052, 1993.




[6] J-P. Thirion. Realistic 3D simulation of shapes and shadows for image
processing. Computer Vision, Graphics and Image Processing: Graphical Models
and Image Processing, 54(1):82-90, 1992.

[7] T. Poggio and F. Girosi. Networks for Approximation and Learning. In
Proc. of the IEEE, Vol. 78, pages 1481-1497, 1990.

[8] R. Brunelli and G. Tecchiolli. Stochastic minimization with adaptive
memory. Technical Report 9211-14, I.R.S.T, 1992. To appear in Journal of
Computational and Applied Mathematics.

[9] T. Poggio and R. Brunelli. A Novel Approach to Graphics. A.I. Memo No.
1354, Massachusetts Institute of Technology, 1992.

[10] B. D. Lucas and T. Kanade. An iterative image registration technique
with an application to stereo vision. In Morgan-Kauffman, editor, Proc.
IJCAI, 1981.

[11] J. R. Bergen and R. Hingorani. Hierarchical, computationally efficient
motion estimation algorithm. Journal of The Optical Society of America,
4:35, 1987.

[12] J. R. Bergen and R. Hingorani. Hierarchical motion-based frame rate
conversion. Technical report, David Sarnoff Research Center, 1990.

[13] D. J. Beymer, A. Shashua, and T. Poggio. Example Based Image Analysis
and Synthesis. A.I. Memo No. 1431, Massachusetts Institute of Technology,
1993.

[14] David J. Beymer. Face Recognition under Varying Pose. A.I. Memo No.
1461, Massachusetts Institute of Technology, 1993.

[15] D. H. Ballard and C. M. Brown. Computer Vision. Prentice Hall, Englewood
Cliffs, NJ, 1982.

[16] K. Aizawa, H. Harashima, and T. Saito. Model-based analysis synthesis
image coding (mbasic) system for a person's face. Signal Processing Image
Communication, 1:139-152, 1989.

[17] W. T. Freeman and Edward H. Adelson. The Design and Use of Steerable
Filters. IEEE Transactions on Pattern Analysis and Machine Intelligence,
13(9):891-906, September 1991.

[18] R. Brunelli. Edge projections for facial feature extraction. Technical
Report 9009-12, I.R.S.T, 1990.

[19] A. Shashua and S. Toelg. The Quadric Reference Surface: Applications in
Registering Views of Complex 3D Objects. Technical Report CAR-TR-702, Center
for Automation Research, University of Maryland, 1994.

[20] Martin Bichsel. Strategies of Robust Object Recognition for the
Identification of Human Faces. PhD thesis, Eidgenossischen Technischen
Hochschule, Zurich, 1991.

[21] R. Brunelli and S. Messelodi. Robust Estimation of Correlation: an
Application to Computer Vision. Technical Report 9310-05, I.R.S.T, 1993.
Submitted for publication to Pattern Recognition.

[22] P. J. Huber. Robust Statistics. Wiley, 1981.

[23] P. J. Burt. Smart sensing within a pyramid vision machine. Proceedings
of the IEEE, 76(8):1006-1015, 1988.

[24] Alan L. Yuille. Deformable templates for face recognition. Journal of
Cognitive Neuroscience, 3(1):59-70, 1991.


Figure  1:  The  facial  region  used  to  estimate  the  direction  of  illuminant.  Four  intensity  values  are  derived  by 
computing  a  weighted  average,  with  Gaussian  weights,  of  the  intensity  over  the  left  (right)  cheek  and  left  (right) 
forehead-eye  regions. 


Figure  2:  Superimposed  Gaussian  receptive  fields  giving  the  four  dimensional  input  of  the  HyperBF  network.  Each 
field computes a weighted average of the intensity. The coordinates of the plot represent the image plane coordinates
of Figure 1.


Figure  3:  Computer  generated  images  (left)  are  used  to  modulate  the  intensity  of  a  single  view  under  approximately 
diffuse  illumination  (center)  to  produce  images  illuminated  from  different  angles  (right).  The  central  images  are 
obtained  by  replication  of  a  single  view.  The  right  images  are  obtained  by  multiplication  of  the  central  and  left 
images  (see  text  for  a  more  detailed  description). 

Figure 4: Error made by a 4-unit HyperBF network in estimating the illuminant direction on the 169 images of the
training  set.  The  horizontal  axis  represents  the  left-right  position  of  the  illuminant  while  the  vertical  axis  represents 
its  height.  The  intensity  of  the  squares  is  proportional  to  the  squared  error:  the  lighter  the  square,  the  greater  the 
error  is. 


Figure  5:  Some  real  images  on  which  the  algorithm  trained  on  the  synthetic  examples  has  been  applied. 


Figure 6: Illuminant direction as estimated by the HyperBF network compared to the real data for the four test
images (plot of computed vs. real angle; horizontal axis: real angle in degrees).


Figure  7:  The  first  three  images  represent  respectively  the  original  image,  the  image  obtained  by  fixing  the  nose  and 
mouth  position  to  that  of  the  reference  image  (the  last  in  the  row)  and  the  refined  warped  image  obtained  using  a 
hierarchical  optical  flow  algorithm. 


Figure  8:  The  original  image  (left)  and  the  image  corrected  using  the  procedure  described  in  the  text. 


Figure 9: The drawing reports the dependence of the gradient projection on the degrees of rotation around the
vertical image axis (plot of gradient projection vs. head rotation). The projections are smoothed using a Gaussian
kernel of $\sigma = 5$. Note the increasing asymmetry of the two peaks.


Figure 10: The drawing reports the dependence of the asymmetry of the projection peaks on the degrees of rotation
around the vertical image axis (plot of rotation vs. asymmetry). The values are computed by averaging the data from
three different people.


Figure  11:  The  top  row  reports  the  test  images  while  the  bottom  row  shows  the  images  generated  using  a  simple  3D 
model  and  the  rotation  estimated  using  the  approximately  linear  dependence  of  the  gradient  projection  asymmetry 
on the rotation around the vertical image axis.


Figure 12: Sensitivity to illumination of some common preprocessing operators. The abscissas represent the values
of $\sigma$ (see text for an explanation).


